In digital image processing and computer vision, semantic segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share common characteristics. In this task, we implement semantic segmentation of magnetic resonance (MR) images based on deep neural networks.
The dataset for this task consists of 200 grayscale intensity images at 96 × 96 resolution and 120 corresponding masks. Specifically, 100 images are for training (with masks), 20 for validation (with masks) and 80 for testing (without masks). The objective is to segment the original images pixel-wise into 4 classes: the background (label: 0), the right ventricle (RV, label: 1), the myocardium (Myo, label: 2) and the left ventricle (LV, label: 3).
To begin with, we dived into several different network architectures to compare their performance and generalisation capacity. By comparing these models, we select the optimal one for this task and apply it to the test set. This process also gives us a direction for optimisation, as it helps us learn more about segmentation networks and the optimisation approaches described in various papers.
The final performance of the model is measured by an average Dice Score computed on Kaggle over the 80 test cases; the score ranges from 0 to 1 (the higher, the better). We aim to achieve an average Dice Score of 0.85.
We selected several efficient architectures for semantic segmentation: Fully Convolutional Networks (Long et al., 2015), SegNet (Badrinarayanan et al., 2017), DeepLab (Chen et al., 2017; Chen et al., 2018) and UNet (Ronneberger et al., 2015; Huang et al., 2020), together with their variations.
Fully Convolutional Networks (FCN), the first end-to-end fully convolutional architecture, remove the fixed-size input constraint and recover the spatial information lost by fully connected layers (Long et al., 2015). However, FCN is less sensitive to details in the image and produces coarse segmentation maps, so more work is needed to achieve better performance.
SegNet improves the segmentation resolution by using an encoder-decoder architecture and raises memory efficiency by reusing pooling indices during upsampling (Badrinarayanan et al., 2017). We compared an advanced version, whose encoder and decoder each mirror the first 13 convolutional layers of the VGG16 network, with a basic version that uses 4 convolutional layers for both the encoder and the decoder.
DeepLab (Chen et al., 2017) combines the advantages of the spatial pyramid pooling module and the encoder-decoder structure. A newer version (DeepLab v3+) changes the backbone, e.g. using ResNet or Xception, and applies depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and the decoder blocks (Chen et al., 2018). In UNet (Ronneberger et al., 2015) and its later version UNet 3+, semantic information in deep, coarse layers is combined with appearance information in shallow, fine layers via plain skip connections (full-scale skip connections and full-scale deep supervision in UNet 3+ (Huang et al., 2020)); this yields better performance on medical image segmentation.
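The depthwise separable convolution used by DeepLab v3+ factorises a standard convolution into a per-channel (depthwise) convolution followed by a 1 × 1 (pointwise) convolution. A minimal sketch, not the exact module from the paper:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2   # keep spatial size unchanged
        # depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=pad, dilation=dilation, groups=in_ch)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# with dilation > 1 this becomes an atrous separable convolution, as in ASPP
y = DepthwiseSeparableConv(64, 128, dilation=2)(torch.randn(1, 64, 24, 24))
```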
Since the architecture combining UNet and UNet 3+ achieved the best results on the test set, we focus on these two networks when explaining our implementation in detail.
Fig. 1. Architectures of UNet (a) and UNet 3+ (b).
In our bagging model, two networks are utilised to produce a final weighted combination of the outputs of UNet and UNet 3+ (shown in Fig. 1). In this architecture, a basic version of UNet 3+ is applied, which omits the bilinear interpolation in the full-scale skip connections, the full-scale deep supervision and the classification-guided module (CGM). Furthermore, the last layer of UNet is removed.
Since both networks use an encoder-decoder structure, the two main pieces of work are building the contracting path and the expansive path.
In the UNet architecture, each block in the contracting path consists of several classic convolutional layers (each followed by a rectified linear unit (ReLU)) and ends with a downsampling step using max pooling. The encoding process of UNet 3+ is identical to that of UNet (except for the number of layers).
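A contracting block of this kind can be sketched in PyTorch as follows (a minimal illustration; the channel sizes here are arbitrary, not our exact configuration):

```python
import torch
import torch.nn as nn

class ContractBlock(nn.Module):
    """Two 3x3 convolutions, each followed by ReLU, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.convs(x)           # kept for the skip connection to the decoder
        return self.pool(skip), skip   # pooled map continues down the encoder

x = torch.randn(1, 1, 96, 96)
down, skip = ContractBlock(1, 64)(x)   # down: 48x48, skip: 96x96
```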
However, there are several differences in how they build the decoder. In the expansive path of UNet, each block consists of two components: a concatenation operation that reuses the corresponding cropped feature map from the contracting path, and a series of traditional operations (upsampling the feature map for concatenation, followed by several convolutional layers and ReLUs). UNet 3+, in contrast, combines multi-scale features in its skip connections, which differ from the plain connections used in UNet. These full-scale skip connections incorporate information from both the encoder and the decoder itself to construct each feature map. The final segmentation map is generated by taking the feature map from the same-scale encoder layer together with the low-level detailed information from the smaller-scale encoder layers, and applying max pooling, batch normalization and a ReLU function.
To apply the above networks to this segmentation task, a number of preparations are necessary: the implementation and configuration of our model candidates, the optimizer (including the initialisation of the learning rate and weight decay), the loss function, and several other hyperparameters such as batch size, dropout, and the maximum number of epochs.
Fig. 2. Implementation.
After loading the training and validation data (both images and masks), we begin the training process. In each epoch, we first set the gradients to zero, and then the model parameters are updated by the optimizer using a self-defined loss function that guides every step we take (shown in Fig. 2). In order to investigate the performance of our model, the time cost, the Dice Score and the average value of the self-defined loss on both training and validation sets are recorded, so that we can select the model that performs best among all candidates (the bagging of UNet and UNet 3+ in our case). Furthermore, the confusion matrix of each model can also be used to evaluate the segmentation accuracy.
To provide more evidence for the generalisation performance of the model, we run it on the test set of 80 cases; computing the Dice Score there gives a more objective evaluation of our model.
Ranging from 0 to 1, the Dice Score evaluates segmentation performance by comparing the predicted mask with the ground truth. As given in the instruction file (shown in Fig. 3), we compute the average Dice Score on the training, validation and test sets from the overlap between our predicted output mask (X) and the ground truth (Y). The average Dice Score for each example is the mean of the Dice Scores of the 3 labels other than the background (0).
Fig. 3. Dice Score
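A per-case average Dice Score of this form can be sketched as follows (assuming masks are integer label maps; the convention for a label absent from both masks is our assumption):

```python
import numpy as np

def dice_score(pred, truth, label):
    """Dice = 2|X ∩ Y| / (|X| + |Y|) for one label."""
    x = (pred == label)
    y = (truth == label)
    denom = x.sum() + y.sum()
    if denom == 0:            # label absent in both masks: count as a perfect match
        return 1.0
    return 2.0 * (x & y).sum() / denom

def average_dice(pred, truth, labels=(1, 2, 3)):
    # average over the three foreground classes; background (0) is excluded
    return sum(dice_score(pred, truth, l) for l in labels) / len(labels)
```

For an identical prediction and ground truth, `average_dice` returns 1.0.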
Our self-defined loss function combines cross-entropy loss and the Dice Score (shown in Fig. 4). By calculating the average loss and Dice Score on the training and validation sets after each epoch, a gap can be observed between the two curves (one for the training set, one for the validation set). This gap lets us evaluate the generalisation performance of a model.
Fig. 4. Self-defined loss function
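One way to combine cross-entropy with a (soft) Dice term as a training loss can be sketched as follows; the weighting `alpha` and the soft-Dice formulation on softmax probabilities are assumptions for illustration, not necessarily the exact function in Fig. 4:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, target, alpha=0.5, eps=1e-6):
    """logits: (N, C, H, W); target: (N, H, W) integer labels."""
    ce = F.cross_entropy(logits, target)
    # soft Dice computed on the softmax probabilities
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    inter = (probs * one_hot).sum(dims)
    denom = probs.sum(dims) + one_hot.sum(dims)
    dice = ((2 * inter + eps) / (denom + eps))[1:].mean()   # skip background class 0
    return alpha * ce + (1 - alpha) * (1 - dice)

loss = combined_loss(torch.randn(2, 4, 8, 8), torch.randint(0, 4, (2, 8, 8)))
```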
To further evaluate the precision of our model, we calculate a confusion matrix over the 4 labels by comparing predicted labels with the ground truth (shown in Fig. 5).
Fig. 5. Confusion matrix
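Pixel-wise, the matrix can be computed by flattening the predicted and ground-truth masks; a minimal sketch using scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def segmentation_confusion(pred, truth, labels=(0, 1, 2, 3)):
    """Rows: ground-truth label; columns: predicted label (pixel counts)."""
    return confusion_matrix(truth.ravel(), pred.ravel(), labels=list(labels))

truth = np.array([[0, 1], [2, 3]])
pred = np.array([[0, 1], [2, 2]])
cm = segmentation_confusion(pred, truth)   # one pixel of class 3 mislabelled as 2
print(cm)
```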
By setting a timer at the beginning and the end of a process on a fixed dataset (training or inference), and counting the total number of parameters, we can measure the computational efficiency of a model.
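Timing and parameter counting of this kind can be sketched as follows (the small stand-in model is only for illustration):

```python
import time
import torch
import torch.nn as nn

def count_parameters(model):
    """Total number of parameters in the model."""
    return sum(p.numel() for p in model.parameters())

def timed_inference(model, batch):
    """Run one forward pass and return (output, elapsed seconds)."""
    model.eval()
    start = time.time()                  # timer at the beginning
    with torch.no_grad():
        out = model(batch)
    return out, time.time() - start      # timer at the end

model = nn.Conv2d(1, 4, kernel_size=3, padding=1)   # stand-in for a real network
out, elapsed = timed_inference(model, torch.randn(2, 1, 96, 96))
```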
Fig. 6. Random flipping horizontally
By randomly flipping our training images horizontally, we improve the network's capacity for feature extraction. An obvious improvement can be seen when comparing the confusion matrices calculated before and after data augmentation (shown in Fig. 7). The average Dice Score on the validation set rose from 0.88459 to 0.89084.
Fig. 7. Confusion matrix : comparison_flipping
After downsampling through 5 layers of UNet, the size of our input (96 × 96) becomes quite small, so we tried removing the last two layers of the encoder. Meanwhile, we tried enlarging the input image (shown in Fig. 8) so that more information is preserved in this process, but the output was not satisfactory. We also changed the contrast and brightness of the input images, as follows.
Fig. 8. Size changing
As shown in the comparison of confusion matrices calculated before and after increasing the brightness of the augmented images (shown in Fig. 9), the performance did not significantly improve, which did not meet our expectation (from 0.89084, Fig. 9(a), to 0.88920, Fig. 9(b)).
(a)
(b)
Fig. 9. Confusion matrix : comparison_size changing
This method is less satisfactory when adapted to our models, as finding the optimum is very time-consuming.
Fig. 10. AdamW
It updates the learning rate directly (shown in Fig. 11).
Fig. 11. Self-defined function to adapt learning rate
We update the learning rate according to the number of epochs using a self-defined optimizer (shown in Fig. 12), which reduces the effect of overshooting the minima during optimisation.
Fig. 12. Updating the learning rate
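An epoch-based learning-rate update of this kind can be sketched as follows; the exponential decay schedule and its constants are assumptions, not necessarily the exact function in Fig. 12:

```python
import torch
import torch.nn as nn

def adjust_learning_rate(optimizer, epoch, base_lr=1e-3, decay=0.95):
    """Exponentially decay the learning rate with the epoch number."""
    lr = base_lr * decay ** epoch
    for group in optimizer.param_groups:   # write the new rate into the optimizer
        group['lr'] = lr
    return lr

optimizer = torch.optim.Adam(nn.Linear(2, 2).parameters(), lr=1e-3)
adjust_learning_rate(optimizer, epoch=10)
```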
Fig. 13. Cross-entropy loss function
Since the final generalisation performance is measured by the Dice Score on the test set, we combine the Dice Score with cross-entropy loss into a self-defined loss function used to update our weights.
Since each network has gone through a number of upgrades (for example, UNet and DeepLab), different architectures and approaches have been used in the various versions. To better train a model to segment our input images, some adjustments are made to the structure of the networks we used.
For example, as mentioned in Section 2.2, we modify the number of layers in UNet. As the size of our input images is 96 × 96 × 1, they are compressed to 6 × 6 × 1024 after four blocks of convolution and max pooling, which might lose some essential features. After removing the last layer of the encoder and modifying the decoder accordingly, we stop downsampling when the size reaches 12 × 12 × 512. The model achieves better performance with this structure.
A new output can be produced by combining the outputs (segmentation maps) of the two networks, each with an assigned weight (shown in Fig. 14).
Fig. 14. Bagging model
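Combining the two segmentation maps can be sketched as a weighted sum of the class probabilities before taking the pixel-wise argmax (the weight value here is an assumption):

```python
import torch
import torch.nn.functional as F

def bagging_predict(out_unet, out_unet3p, w=0.5):
    """out_*: raw logits of shape (N, C, H, W) from the two networks."""
    probs = w * F.softmax(out_unet, dim=1) + (1 - w) * F.softmax(out_unet3p, dim=1)
    return probs.argmax(dim=1)          # final label map of shape (N, H, W)

a = torch.randn(1, 4, 96, 96)           # stand-ins for the UNet and UNet 3+ outputs
b = torch.randn(1, 4, 96, 96)
mask = bagging_predict(a, b, w=0.6)
```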
From the perspective of optimisation, a fixed and popular process is applied, including modification of the loss function and updating of the optimizer. Moreover, work on data pre-processing and model restructuring is also a good avenue of optimisation.
For example, in the bagging of UNet and UNet 3+, we aim to combine the ability of these two networks to extract both semantic and appearance information. Making use of their respective strengths, we came up with this method to produce a new map.
Two more detailed optimisation processes are described below (taking DeepLab and SegNet as examples):
In DeepLab, ResNet and Xception are adopted separately as the backbone to optimise our segmentation.
Firstly, the effect of different output strides on the model is investigated, adopting the same loss function and optimizer in the ResNet model. We tried output strides of 8 and 16 and found that the former performs better and its loss decreases faster during training. Then we changed other hyperparameters and used our self-defined function as below:
Using Adam with an adaptive (self-defined) learning rate as the optimizer, the maximum average Dice Score on the validation set reaches 0.74 before overfitting.
The model using Xception is optimised similarly, with the self-defined loss function defined below:
Before overfitting, the maximum average Dice Score on the validation set reaches 0.81 with Adam at a learning rate of 0.001.
Several optimisation approaches are used to achieve better performance and minimise overfitting for SegNet; the results are summarised in the table below.
It can be observed that when adopting a dropout layer and our self-defined loss with the Adam optimizer, the average Dice Score on the validation set improves to 0.8 (shown in bold in the table).
SegNet adopts the first 13 convolutional layers of VGG16 for its encoder. Compared to SegNet, the basic version (SegNet-Basic) is shallower, with fewer encoder layers and the same number of decoder layers. A light version is also designed, with a smaller kernel size and padding in the encoders and decoders.
Making SegNet shallower and lighter achieves better performance; however, it then suffers severely from overfitting.
In conclusion, the fitting and generalisation capacities of SegNet (including SegNet-Basic) are hard to improve significantly. At best it achieves a Dice Score of 0.852, with overfitting, in this scenario, which indicates that SegNet may have some serious defects here and is not optimal for this segmentation task.
Changes in loss and Dice Score during training of the networks under optimal settings ((a) and (b)). The confusion matrices (c) are acquired with the optimal model parameters.
Given a task (in this case, medical semantic segmentation), we need to be very careful when selecting a network to build our model. The papers proposing these networks summarise experiments evaluating their performance on representative public datasets and mention the scenarios in which the networks are suggested to be applied.
For instance, SegNet performs better on road scenes, which might be why it is not suitable for this medical segmentation task, while UNet and UNet 3+ are more frequently adopted in medical scenarios. One latent factor that reduces the performance and generalisation capacity of SegNet in this task may be that, unlike UNet and UNet 3+, it does not pass the feature map of each encoder block to the decoder, thus losing some vital information. Also, complex architectures like DeepLab may not perform well and can overfit easily in this relatively simple segmentation scenario.
In short, before choosing a network for a specific task, the context of the task and the application scenarios of the different networks need to be considered carefully.
When we optimise training by data augmentation of the training set, an obvious improvement in labelling can be seen in the corresponding confusion matrix (as described in Section 3.1), which supports the idea that the size of the training set matters.
As for the validation set, more data might allow a better comparison among models and help achieve better generalisation capacity when deciding at which epoch to stop training.
To evaluate the generalisation performance of our models, we need a method to represent how accurate the output of a model is on an unseen dataset, which in our case is the Dice Score on the test set.
The highest score achieved by our models is 0.89084. In addition, a narrower gap between the two curves (Section 3.2.2) can be observed for UNet and UNet 3+ during training.
Rather than using a single model to segment an image, we produce the output segmentation map by combining the predicted masks from 2 networks with different weights (as described in Section 2.2), so that we can combine and balance the advantages of the two networks. Thus, more combinations are worth trying, such as ones including DeepLab. Further, the weight for each network could also be learned.
The critical information in an image is usually around the centre (especially in our case). Therefore, we could build a model consisting of two parallel pathways producing two segmentation maps, and then combine them into a more precise output.
Some of the code comes from open-source code on the Internet; we have partially modified it and applied it to this assignment. Only papers are listed in the References.
Other code that we attached but did not list here can be found on GitHub.
Link : https://github.com/soda-bread/NC-Coursework-group-16
[1] Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., & Ayed, I. B. (2019, May). Boundary loss for highly unbalanced segmentation. In International conference on medical imaging with deep learning (pp. 285-296). PMLR.
[2] Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV) (pp. 801-818).
[3] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834-848.
[4] Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).
[5] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
[6] Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12), 2481-2495.
[7] Milletari, F., Navab, N., & Ahmadi, S. A. (2016, October). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) (pp. 565-571). IEEE.
[8] Huang, H., Lin, L., Tong, R., Hu, H., Zhang, Q., Iwamoto, Y., ... & Wu, J. (2020, May). UNet 3+: A full-scale connected UNet for medical image segmentation. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1055-1059). IEEE.
[9] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[10] Loshchilov, I., & Hutter, F. (2018). Fixing weight decay regularization in adam.
# Library
import os
import cv2
import torch
import torchvision
import random
import numpy
import time
import itertools
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torch.utils.data as data
from torch.utils.data import DataLoader
from PIL import Image
from glob import glob
from sklearn.metrics import confusion_matrix
# Visualisation test
def show_image_mask(img, mask, cmap='gray'):
    fig = plt.figure(figsize=(5, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(img, cmap=cmap)
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.imshow(mask, cmap=cmap)
    plt.axis('off')

data_dir = './data/train'
image = cv2.imread(os.path.join(data_dir, 'image', 'cmr1.png'), cv2.IMREAD_UNCHANGED)
mask = cv2.imread(os.path.join(data_dir, 'mask', 'cmr1_mask.png'), cv2.IMREAD_UNCHANGED)
show_image_mask(image, mask, cmap='gray')
plt.pause(1)
cv2.imwrite(os.path.join('./', 'cmr1.png'), mask * 85)  # scale labels 0-3 to visible grey levels
# Check the device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"The current device is {device}")
The current device is cuda:0
# Data Loader & Data Augmentation
classes = ['0','1','2','3']
class TrainDataset(data.Dataset):
    def __init__(self, root=''):
        super(TrainDataset, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))
        self.mask_files = []
        for img_path in self.img_files:
            basename = os.path.basename(img_path)
            self.mask_files.append(os.path.join(root, 'mask', basename[:-4] + '_mask.png'))
        # Define the random flip: transform 1 always flips (p=1), transform 2 never flips (p=0)
        self.datatransform1 = torchvision.transforms.Compose([
            torchvision.transforms.RandomHorizontalFlip(p=1)
        ])
        self.datatransform2 = torchvision.transforms.Compose([
            torchvision.transforms.RandomHorizontalFlip(p=0)
        ])

    def __getitem__(self, index):
        img_path = self.img_files[index]
        mask_path = self.mask_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        label = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)
        # Data augmentation - random flip (image and mask must be flipped together)
        image1 = Image.fromarray(data)
        image2 = Image.fromarray(label)
        # Draw a random integer and use its parity as a coin flip
        op = random.randint(100007, 100000007)
        op = op % 2
        if op:  # flip when op is odd
            image1 = self.datatransform1(image1)
            image2 = self.datatransform1(image2)
        else:   # keep unchanged when op is even
            image1 = self.datatransform2(image1)
            image2 = self.datatransform2(image2)
        data = numpy.array(image1)
        label = numpy.array(image2)
        return torch.from_numpy(data).float(), torch.from_numpy(label).float()

    def __len__(self):
        return len(self.img_files)
class TestDataset(data.Dataset):
    def __init__(self, root=''):
        super(TestDataset, self).__init__()
        self.img_files = glob(os.path.join(root, 'image', '*.png'))
        # sort by the number in the filename so predictions line up with the submission order
        self.img_files.sort(key=lambda x: int(x[-7:-4]))

    def __getitem__(self, index):
        img_path = self.img_files[index]
        data = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        return torch.from_numpy(data).float()

    def __len__(self):
        return len(self.img_files)
# UNet 3+
class unetConv2(nn.Module):
    # (convolution => [BN] => ReLU) * 2
    # A block containing two steps of convolution
    def __init__(self, in_channels, out_channels, is_batchnorm):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.double_conv(x)
# Define the UNet 3+ network
class UNet_3Plus(nn.Module):
    def __init__(self):
        super(UNet_3Plus, self).__init__()
        in_channels = 1
        n_classes = 4
        feature_scale = 4
        is_deconv = True
        is_batchnorm = True
        self.is_deconv = is_deconv
        self.in_channels = in_channels
        self.is_batchnorm = is_batchnorm
        self.feature_scale = feature_scale
        # List of all the channel counts used in the encoder and decoder
        filters = [64, 128, 256, 512, 1024]

        ## -------------Encoder--------------
        # Five encoding steps; each step halves the width and height and doubles the channels
        self.conv1 = unetConv2(self.in_channels, filters[0], self.is_batchnorm)
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        self.conv2 = unetConv2(filters[0], filters[1], self.is_batchnorm)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        self.conv3 = unetConv2(filters[1], filters[2], self.is_batchnorm)
        self.maxpool3 = nn.MaxPool2d(kernel_size=2)
        self.conv4 = unetConv2(filters[2], filters[3], self.is_batchnorm)
        self.maxpool4 = nn.MaxPool2d(kernel_size=2)
        self.conv5 = unetConv2(filters[3], filters[4], self.is_batchnorm)

        ## -------------Decoder--------------
        # CatBlocks is 5 because each upsampling stage merges features from 5 scales,
        # each reduced to 64 channels
        self.CatChannels = filters[0]
        self.CatBlocks = 5
        self.UpChannels = self.CatChannels * self.CatBlocks

        '''stage 4d'''
        # h1: 96*96 -> hd4: 12*12, max pool by a factor of 8
        self.h1_PT_hd4 = nn.MaxPool2d(8, 8, ceil_mode=True)
        self.h1_PT_hd4_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1)
        self.h1_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels)
        self.h1_PT_hd4_relu = nn.ReLU(inplace=True)
        # h2: 48*48 -> hd4: 12*12, max pool by a factor of 4
        self.h2_PT_hd4 = nn.MaxPool2d(4, 4, ceil_mode=True)
        self.h2_PT_hd4_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1)
        self.h2_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels)
        self.h2_PT_hd4_relu = nn.ReLU(inplace=True)
        # h3: 24*24 -> hd4: 12*12, max pool by a factor of 2
        self.h3_PT_hd4 = nn.MaxPool2d(2, 2, ceil_mode=True)
        self.h3_PT_hd4_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1)
        self.h3_PT_hd4_bn = nn.BatchNorm2d(self.CatChannels)
        self.h3_PT_hd4_relu = nn.ReLU(inplace=True)
        # h4: 12*12, hd4: 12*12, concatenation at the same scale
        self.h4_Cat_hd4_conv = nn.Conv2d(filters[3], self.CatChannels, 3, padding=1)
        self.h4_Cat_hd4_bn = nn.BatchNorm2d(self.CatChannels)
        self.h4_Cat_hd4_relu = nn.ReLU(inplace=True)
        # hd5: 6*6 -> hd4: 12*12, upsample by a factor of 2
        self.hd5_UT_hd4 = nn.Upsample(scale_factor=2, mode='bilinear')
        self.hd5_UT_hd4_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1)
        self.hd5_UT_hd4_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd5_UT_hd4_relu = nn.ReLU(inplace=True)
        # fusion(h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4)
        # after concatenation, convolve hd4;
        # it merges the features of all 5 scales, so its channels are 5*64 = 320
        self.conv4d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1)
        self.bn4d_1 = nn.BatchNorm2d(self.UpChannels)
        self.relu4d_1 = nn.ReLU(inplace=True)
        '''stage 3d'''
        # h1: 96*96 -> hd3: 24*24, max pool by a factor of 4
        self.h1_PT_hd3 = nn.MaxPool2d(4, 4, ceil_mode=True)
        self.h1_PT_hd3_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1)
        self.h1_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels)
        self.h1_PT_hd3_relu = nn.ReLU(inplace=True)
        # h2: 48*48 -> hd3: 24*24, max pool by a factor of 2
        self.h2_PT_hd3 = nn.MaxPool2d(2, 2, ceil_mode=True)
        self.h2_PT_hd3_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1)
        self.h2_PT_hd3_bn = nn.BatchNorm2d(self.CatChannels)
        self.h2_PT_hd3_relu = nn.ReLU(inplace=True)
        # h3: 24*24, hd3: 24*24, concatenation at the same scale
        self.h3_Cat_hd3_conv = nn.Conv2d(filters[2], self.CatChannels, 3, padding=1)
        self.h3_Cat_hd3_bn = nn.BatchNorm2d(self.CatChannels)
        self.h3_Cat_hd3_relu = nn.ReLU(inplace=True)
        # hd4: 12*12 -> hd3: 24*24, upsample by a factor of 2
        self.hd4_UT_hd3 = nn.Upsample(scale_factor=2, mode='bilinear')
        self.hd4_UT_hd3_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd4_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd4_UT_hd3_relu = nn.ReLU(inplace=True)
        # hd5: 6*6 -> hd3: 24*24, upsample by a factor of 4
        self.hd5_UT_hd3 = nn.Upsample(scale_factor=4, mode='bilinear')
        self.hd5_UT_hd3_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1)
        self.hd5_UT_hd3_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd5_UT_hd3_relu = nn.ReLU(inplace=True)
        # fusion(h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3)
        # after concatenation, convolve hd3;
        # it merges the features of all 5 scales, so its channels are 5*64 = 320
        self.conv3d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1)
        self.bn3d_1 = nn.BatchNorm2d(self.UpChannels)
        self.relu3d_1 = nn.ReLU(inplace=True)
        '''stage 2d'''
        # h1: 96*96 -> hd2: 48*48, max pool by a factor of 2
        self.h1_PT_hd2 = nn.MaxPool2d(2, 2, ceil_mode=True)
        self.h1_PT_hd2_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1)
        self.h1_PT_hd2_bn = nn.BatchNorm2d(self.CatChannels)
        self.h1_PT_hd2_relu = nn.ReLU(inplace=True)
        # h2: 48*48, hd2: 48*48, concatenation at the same scale
        self.h2_Cat_hd2_conv = nn.Conv2d(filters[1], self.CatChannels, 3, padding=1)
        self.h2_Cat_hd2_bn = nn.BatchNorm2d(self.CatChannels)
        self.h2_Cat_hd2_relu = nn.ReLU(inplace=True)
        # hd3: 24*24 -> hd2: 48*48, upsample by a factor of 2
        self.hd3_UT_hd2 = nn.Upsample(scale_factor=2, mode='bilinear')
        self.hd3_UT_hd2_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd3_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd3_UT_hd2_relu = nn.ReLU(inplace=True)
        # hd4: 12*12 -> hd2: 48*48, upsample by a factor of 4
        self.hd4_UT_hd2 = nn.Upsample(scale_factor=4, mode='bilinear')
        self.hd4_UT_hd2_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd4_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd4_UT_hd2_relu = nn.ReLU(inplace=True)
        # hd5: 6*6 -> hd2: 48*48, upsample by a factor of 8
        self.hd5_UT_hd2 = nn.Upsample(scale_factor=8, mode='bilinear')
        self.hd5_UT_hd2_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1)
        self.hd5_UT_hd2_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd5_UT_hd2_relu = nn.ReLU(inplace=True)
        # fusion(h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2)
        # after concatenation, convolve hd2;
        # it merges the features of all 5 scales, so its channels are 5*64 = 320
        self.conv2d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1)
        self.bn2d_1 = nn.BatchNorm2d(self.UpChannels)
        self.relu2d_1 = nn.ReLU(inplace=True)
        '''stage 1d'''
        # h1: 96*96, hd1: 96*96, concatenation at the same scale
        self.h1_Cat_hd1_conv = nn.Conv2d(filters[0], self.CatChannels, 3, padding=1)
        self.h1_Cat_hd1_bn = nn.BatchNorm2d(self.CatChannels)
        self.h1_Cat_hd1_relu = nn.ReLU(inplace=True)
        # hd2: 48*48 -> hd1: 96*96, upsample by a factor of 2
        self.hd2_UT_hd1 = nn.Upsample(scale_factor=2, mode='bilinear')
        self.hd2_UT_hd1_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd2_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd2_UT_hd1_relu = nn.ReLU(inplace=True)
        # hd3: 24*24 -> hd1: 96*96, upsample by a factor of 4
        self.hd3_UT_hd1 = nn.Upsample(scale_factor=4, mode='bilinear')
        self.hd3_UT_hd1_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd3_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd3_UT_hd1_relu = nn.ReLU(inplace=True)
        # hd4: 12*12 -> hd1: 96*96, upsample by a factor of 8
        self.hd4_UT_hd1 = nn.Upsample(scale_factor=8, mode='bilinear')
        self.hd4_UT_hd1_conv = nn.Conv2d(self.UpChannels, self.CatChannels, 3, padding=1)
        self.hd4_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd4_UT_hd1_relu = nn.ReLU(inplace=True)
        # hd5: 6*6 -> hd1: 96*96, upsample by a factor of 16
        self.hd5_UT_hd1 = nn.Upsample(scale_factor=16, mode='bilinear')
        self.hd5_UT_hd1_conv = nn.Conv2d(filters[4], self.CatChannels, 3, padding=1)
        self.hd5_UT_hd1_bn = nn.BatchNorm2d(self.CatChannels)
        self.hd5_UT_hd1_relu = nn.ReLU(inplace=True)
        # fusion(h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1)
        # after concatenation, convolve hd1;
        # it merges the features of all 5 scales, so its channels are 5*64 = 320
        self.conv1d_1 = nn.Conv2d(self.UpChannels, self.UpChannels, 3, padding=1)
        self.bn1d_1 = nn.BatchNorm2d(self.UpChannels)
        self.relu1d_1 = nn.ReLU(inplace=True)

        # output
        # a convolution converts the channels from 320 (UpChannels) to 4 (n_classes)
        self.outconv1 = nn.Conv2d(self.UpChannels, n_classes, 3, padding=1)
def forward(self, inputs):
## -------------Encoder-------------
h1 = self.conv1(inputs) # h1->96*96*64
h2 = self.maxpool1(h1)
h2 = self.conv2(h2) # h2->48*48*128
h3 = self.maxpool2(h2)
h3 = self.conv3(h3) # h3->24*24*256
h4 = self.maxpool3(h3)
h4 = self.conv4(h4) # h4->12*12*512
h5 = self.maxpool4(h4)
hd5 = self.conv5(h5) # h5->6*6*1024
## -------------Decoder-------------
#use the steps above to concat the features of h1,h2,h3,h4,hd5 and makes a new cell hd4
#combines 5 matrix of 12*12*64 to a matrix 12*12*320
h1_PT_hd4 = self.h1_PT_hd4_relu(self.h1_PT_hd4_bn(self.h1_PT_hd4_conv(self.h1_PT_hd4(h1))))
h2_PT_hd4 = self.h2_PT_hd4_relu(self.h2_PT_hd4_bn(self.h2_PT_hd4_conv(self.h2_PT_hd4(h2))))
h3_PT_hd4 = self.h3_PT_hd4_relu(self.h3_PT_hd4_bn(self.h3_PT_hd4_conv(self.h3_PT_hd4(h3))))
h4_Cat_hd4 = self.h4_Cat_hd4_relu(self.h4_Cat_hd4_bn(self.h4_Cat_hd4_conv(h4)))
hd5_UT_hd4 = self.hd5_UT_hd4_relu(self.hd5_UT_hd4_bn(self.hd5_UT_hd4_conv(self.hd5_UT_hd4(hd5))))
hd4 = self.relu4d_1(self.bn4d_1(self.conv4d_1(
torch.cat((h1_PT_hd4, h2_PT_hd4, h3_PT_hd4, h4_Cat_hd4, hd5_UT_hd4), 1)))) # hd4->40*40*UpChannels
#use the steps above to concat the features of h1,h2,h3,hd4,hd5 and makes a new cell hd3
#combines 5 matrix of 24*24*64 to a matrix 24*24*320
h1_PT_hd3 = self.h1_PT_hd3_relu(self.h1_PT_hd3_bn(self.h1_PT_hd3_conv(self.h1_PT_hd3(h1))))
h2_PT_hd3 = self.h2_PT_hd3_relu(self.h2_PT_hd3_bn(self.h2_PT_hd3_conv(self.h2_PT_hd3(h2))))
h3_Cat_hd3 = self.h3_Cat_hd3_relu(self.h3_Cat_hd3_bn(self.h3_Cat_hd3_conv(h3)))
hd4_UT_hd3 = self.hd4_UT_hd3_relu(self.hd4_UT_hd3_bn(self.hd4_UT_hd3_conv(self.hd4_UT_hd3(hd4))))
hd5_UT_hd3 = self.hd5_UT_hd3_relu(self.hd5_UT_hd3_bn(self.hd5_UT_hd3_conv(self.hd5_UT_hd3(hd5))))
hd3 = self.relu3d_1(self.bn3d_1(self.conv3d_1(
torch.cat((h1_PT_hd3, h2_PT_hd3, h3_Cat_hd3, hd4_UT_hd3, hd5_UT_hd3), 1)))) # hd3->24*24*UpChannels
# concatenate the rescaled features of h1, h2, hd3, hd4 and hd5 to build the decoder cell hd2:
# five 48*48*64 maps are combined into one 48*48*320 map
h1_PT_hd2 = self.h1_PT_hd2_relu(self.h1_PT_hd2_bn(self.h1_PT_hd2_conv(self.h1_PT_hd2(h1))))
h2_Cat_hd2 = self.h2_Cat_hd2_relu(self.h2_Cat_hd2_bn(self.h2_Cat_hd2_conv(h2)))
hd3_UT_hd2 = self.hd3_UT_hd2_relu(self.hd3_UT_hd2_bn(self.hd3_UT_hd2_conv(self.hd3_UT_hd2(hd3))))
hd4_UT_hd2 = self.hd4_UT_hd2_relu(self.hd4_UT_hd2_bn(self.hd4_UT_hd2_conv(self.hd4_UT_hd2(hd4))))
hd5_UT_hd2 = self.hd5_UT_hd2_relu(self.hd5_UT_hd2_bn(self.hd5_UT_hd2_conv(self.hd5_UT_hd2(hd5))))
hd2 = self.relu2d_1(self.bn2d_1(self.conv2d_1(
torch.cat((h1_PT_hd2, h2_Cat_hd2, hd3_UT_hd2, hd4_UT_hd2, hd5_UT_hd2), 1)))) # hd2->48*48*UpChannels
# concatenate the rescaled features of h1, hd2, hd3, hd4 and hd5 to build the decoder cell hd1:
# five 96*96*64 maps are combined into one 96*96*320 map
h1_Cat_hd1 = self.h1_Cat_hd1_relu(self.h1_Cat_hd1_bn(self.h1_Cat_hd1_conv(h1)))
hd2_UT_hd1 = self.hd2_UT_hd1_relu(self.hd2_UT_hd1_bn(self.hd2_UT_hd1_conv(self.hd2_UT_hd1(hd2))))
hd3_UT_hd1 = self.hd3_UT_hd1_relu(self.hd3_UT_hd1_bn(self.hd3_UT_hd1_conv(self.hd3_UT_hd1(hd3))))
hd4_UT_hd1 = self.hd4_UT_hd1_relu(self.hd4_UT_hd1_bn(self.hd4_UT_hd1_conv(self.hd4_UT_hd1(hd4))))
hd5_UT_hd1 = self.hd5_UT_hd1_relu(self.hd5_UT_hd1_bn(self.hd5_UT_hd1_conv(self.hd5_UT_hd1(hd5))))
hd1 = self.relu1d_1(self.bn1d_1(self.conv1d_1(
torch.cat((h1_Cat_hd1, hd2_UT_hd1, hd3_UT_hd1, hd4_UT_hd1, hd5_UT_hd1), 1)))) # hd1->96*96*UpChannels
# output the final result
# 96*96*320 -> 96*96*4
d1 = self.outconv1(hd1)
return d1
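The full-scale skip connections above rescale every source feature map to the spatial size of the target decoder stage before the five 64-channel branches are concatenated. A small illustrative helper (hypothetical, not part of the network code) tabulates which rescaling each source level needs for a 96*96 input:

```python
# Hypothetical helper for illustration only: for decoder stage d (1 = full
# resolution), list how each of the five source levels is rescaled before the
# 64-channel branches are concatenated into 5 * 64 = 320 channels.
def fullscale_sources(d, depth=5, base=96):
    """Return {source_level: (rescaling_op, target_size)} for decoder stage d."""
    sources = {}
    target = base // 2 ** (d - 1)          # spatial size of decoder stage d
    for s in range(1, depth + 1):
        size_s = base // 2 ** (s - 1)      # spatial size of source level s
        if size_s > target:                # shallower map: max-pool down
            op = "maxpool x%d" % (size_s // target)
        elif size_s < target:              # deeper map: bilinear upsample
            op = "upsample x%d" % (target // size_s)
        else:                              # same scale: plain 3x3 conv branch
            op = "identity"
        sources[s] = (op, target)
    return sources

# e.g. decoder stage hd4 (12*12): h1 is max-pooled 8x, h2 4x, h3 2x,
# h4 passes through unchanged, and hd5 (6*6) is upsampled 2x
```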
# UNet
class DoubleConv(nn.Module):
"""(convolution => [BN] => ReLU) * 2"""
# a block of two 3x3 convolutions, each followed by BatchNorm and ReLU
def __init__(self, in_channels, out_channels, mid_channels=None):
super().__init__()
if mid_channels is None:
mid_channels = out_channels
self.double_conv = nn.Sequential(
nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(mid_channels),
nn.ReLU(inplace = True),
nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace = True)
)
def forward(self, x):
return self.double_conv(x)
class Down(nn.Module):
def __init__(self, in_channels, out_channels):
super(Down,self).__init__()
# downscaling block: one 2x2 max-pooling followed by a DoubleConv
self.max_pool_conv = nn.Sequential(
nn.MaxPool2d(2),
DoubleConv(in_channels, out_channels)
)
def forward(self, x):
return self.max_pool_conv(x)
class Up(nn.Module):
def __init__(self, in_channels, out_channels, Transpose=False):
super(Up,self).__init__()
# Transpose=True would use a transposed convolution (deconvolution), but it is not used in this network;
# otherwise bilinear upsampling doubles the spatial size and a 1x1 convolution halves the channels
if Transpose:
self.up = nn.ConvTranspose2d(in_channels, in_channels//2, 2, stride=2)
else:
self.up = nn.Sequential(nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
nn.Conv2d(in_channels, in_channels // 2, kernel_size=1, padding=0),
nn.ReLU(inplace=True))
self.conv = DoubleConv(in_channels, out_channels)
# self.up.apply(self.init_weights)
def forward(self, x1, x2):
x1 = self.up(x1)
# pad x1 if its spatial size differs from x2 (not needed for 96*96 inputs, kept as a safeguard)
diffY = x2.size()[2] - x1.size()[2]
diffX = x2.size()[3] - x1.size()[3]
x1 = nn.functional.pad(x1, (diffX // 2, diffX - diffX//2,
diffY // 2, diffY - diffY//2))
# concatenate the two feature maps along the channel dimension
x = torch.cat([x2,x1], dim=1)
x = self.conv(x)
return x
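The defensive padding in Up.forward splits any size mismatch as evenly as possible between the two sides of each spatial axis. A minimal sketch of that arithmetic (the helper name is ours):

```python
def symmetric_pad(diff):
    # left/top side gets diff // 2, right/bottom side gets the remainder,
    # so an odd difference is split as evenly as possible
    return diff // 2, diff - diff // 2

# with 96*96 inputs every intermediate size divides evenly, so diff is 0 and
# the padding is a no-op; it only matters for odd intermediate sizes
```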
# a 1x1 convolution that maps the final feature maps to class logits
class out_conv(nn.Module):
def __init__(self,in_channels,out_channels):
super(out_conv,self).__init__()
self.conv = nn.Conv2d(in_channels,out_channels,kernel_size=1)
def forward(self,x):
return self.conv(x)
# define the unet
class Unet(nn.Module):
def __init__(self):
super(Unet,self).__init__()
self.in_channels = 1
self.inc = DoubleConv(self.in_channels,64)
self.down1 = Down(64, 128)
self.down2 = Down(128, 256)
self.down3 = Down(256, 512)
self.drop3 = nn.Dropout2d(0.05)
self.down4 = Down(512, 1024)
self.drop4 = nn.Dropout2d(0.05)
self.up1 = Up(1024, 512, False)
self.up2 = Up(512,256,False)
self.up3 = Up(256, 128, False)
self.up4 = Up(128, 64, False)
self.out = out_conv(64,4)
# self.optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
def forward(self,x):
self.x = x
# 96*96*1 -> 96*96*64
x1 = self.inc(self.x)
# 96*96*64->48*48*128
x2 = self.down1(x1)
# 48*48*128->24*24*256
x3 = self.down2(x2)
# 24*24*256->12*12*512
x4 = self.down3(x3)
x4 = self.drop3(x4)
# the original UNet has a depth of 5, but in this work we use a 3-level
# version and drop x5, because the images are small (96*96);
# we found that removing x5 improves the results
# x5 = self.down4(x4)
# x5 = self.drop4(x5)
# x = self.up1(x5, x4)
# up2: 12*12*512 -> 24*24*256 (upsample x4, halve its channels,
# concatenate with x3, then fuse with a DoubleConv)
x = self.up2(x4, x3)
# up3: 24*24*256 -> 48*48*128 (upsample, halve channels, concatenate with x2)
x = self.up3(x, x2)
# up4: 48*48*128 -> 96*96*64 (upsample, halve channels, concatenate with x1)
x = self.up4(x, x1)
# 96*96*64 -> 96*96*4
x = self.out(x)
return x
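Because the 96*96 input divides evenly by 2 at every level, the three pooling stages produce clean integer sizes on the way down and back up. A quick sanity check of the spatial sizes along the reduced-depth network (matching the shape comments above):

```python
def unet_sizes(size=96, depth=3):
    # spatial size after each max-pooling on the way down, then mirrored back up
    downs = [size // 2 ** i for i in range(depth + 1)]  # encoder sizes
    ups = downs[::-1]                                   # decoder sizes
    return downs, ups
```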
# Dice Score
def categorical_dice(mask1, mask2, label_class=1):
"""
Dice score of a specified class between two volumes of label masks.
(classes are encoded by label number, not one-hot)
Note: stacks of 2D slices are considered volumes.
Args:
mask1: N label masks, numpy array shaped (H, W, N)
mask2: N label masks, numpy array shaped (H, W, N)
label_class: the class over which to calculate dice scores
Returns:
volume_dice
"""
mask1_pos = (mask1 == label_class).astype(np.float32)
mask2_pos = (mask2 == label_class).astype(np.float32)
dice = 2 * np.sum(mask1_pos * mask2_pos) / (np.sum(mask1_pos) + np.sum(mask2_pos))
return dice
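A tiny worked example of the Dice computation (mask values chosen by us): when a class is absent from both masks the ratio is 0/0, which NumPy evaluates to NaN. That is why np.seterr(invalid='ignore') and np.nanmean appear in the training code below.

```python
import numpy as np

pred = np.array([[0, 1],
                 [1, 1]])
true = np.array([[0, 1],
                 [2, 1]])

# class 1: 3 predicted, 2 true, 2 overlapping -> 2*2 / (3+2) = 0.8
m1 = (pred == 1).astype(np.float32)
m2 = (true == 1).astype(np.float32)
dice1 = 2 * np.sum(m1 * m2) / (np.sum(m1) + np.sum(m2))

# class 3 appears in neither mask -> 0/0 -> NaN
with np.errstate(invalid='ignore'):
    m1 = (pred == 3).astype(np.float32)
    m2 = (true == 3).astype(np.float32)
    dice3 = 2 * np.sum(m1 * m2) / (np.sum(m1) + np.sum(m2))

# np.nanmean ignores the NaN entry when averaging over classes
mean_dice = np.nanmean([dice1, dice3])
```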
# Define the Loss function
class losses():
def __init__(self):
self.CE= nn.CrossEntropyLoss()
# CrossEntropyLoss
def forward(self, pred, true):
return self.CE(pred,true)
# Custom loss: cross-entropy plus (3 - sum of foreground Dice scores) / 2.
# Note: the Dice term is computed on argmax masks in numpy, so it carries no
# gradient; only the cross-entropy part drives backpropagation.
def forward2(self,pred,true,pred_mask,true_mask):
self.val = 0
for i in range(3):
self.val = self.val + categorical_dice(pred_mask.data.cpu().numpy(), true_mask.data.cpu().numpy(), i+1)
return self.CE(pred,true) + (3 - self.val) / 2
# Function: optimize the learning rate
def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=15):
"""Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
lr = init_lr * (0.5 ** (epoch // lr_decay_epoch))
if epoch % lr_decay_epoch == 0:
print('LR is set to {}'.format(lr))
for param_group in optimizer.param_groups:
param_group['lr'] = lr
return optimizer
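The schedule therefore halves the learning rate every 15 epochs; the values below reproduce the "LR is set to ..." lines in the training logs further down.

```python
def scheduled_lr(epoch, init_lr=0.001, lr_decay_epoch=15):
    # same formula as exp_lr_scheduler above
    return init_lr * (0.5 ** (epoch // lr_decay_epoch))

# epochs 0, 15, 30, 45 -> 0.001, 0.0005, 0.00025, 0.000125
```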
# Train Unet
np.seterr(invalid='ignore') # suppress invalid-value (0/0) warnings from the Dice computation
model = Unet().to(device)
learning_rate = 0.001
weight_decay = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
# Initialize the training dataset
data_path = './data/train'
num_workers = 1
batch_size = 4
train_set = TrainDataset(data_path)
training_data_loader = DataLoader(dataset=train_set, num_workers=num_workers, batch_size=batch_size, shuffle=True)
# Initialize the validating dataset
val_data_path = './data/val'
val_num_workers = 1
val_batch_size = 4
val_set = TrainDataset(val_data_path)
val_data_loader = DataLoader(dataset=val_set, num_workers=val_num_workers, batch_size=val_batch_size, shuffle=False)
# Initialize the values
running_loss = 0.0
my_loss = losses()
Max_val = 100000000000 # best (lowest) validation loss seen so far
train_losses = []
val_losses = []
plot_train_dice = []
plot_val_dice = []
# Initialize the timer
T1 = time.time()
# Epoch
num_epoches = 50
# Fetch images and labels.
for epoch in range(num_epoches):
# Optimizer
optimizer = exp_lr_scheduler(optimizer, epoch)
# Initialize the values
total_loss = 0
val_total_loss = 0
train_dice = 0
train_dice_cnt = 0
val_dice = 0
val_dice_cnt = 0
# For-loop on the training set
for iteration, sample in enumerate(training_data_loader):
# Initialize the images and labels
img, mask = sample
img = img.to(device)
mask = mask.to(device)
# Adjust the dimension
img = img.unsqueeze(1)
# Gradient -> 0
optimizer.zero_grad()
# Output
outputs = model(img)
output = outputs.argmax(dim=1)
for i in range(batch_size):
output_dice = output[i]
dices = []
for k in range(1, 4):
dices.append(categorical_dice(output_dice.data.cpu().numpy(), mask[i].data.cpu().numpy(), label_class = k))
# average Dice over the foreground classes; nanmean skips the NaN produced when a class is absent from both masks
dice_score = np.nanmean(np.array(dices))
train_dice += dice_score
train_dice_cnt = train_dice_cnt + 1
# Loss and the total
loss = my_loss.forward2(outputs,mask.long(),mask, output)
total_loss += loss.detach() # detach so each batch's computation graph can be freed
loss.backward()
# Update w
optimizer.step()
# After 1 for-loop on the training dataset
# Average Loss
train_losses.append(total_loss.item() / 100)
plot_train_dice.append(train_dice / train_dice_cnt)
print("Epoch %d, average loss: %f" % (epoch + 1, total_loss / 100))
# For-loop on the validating set
for iteration, sample in enumerate(val_data_loader):
# Initialize the images and labels
img, mask = sample
img = img.to(device)
mask = mask.to(device)
# Adjust the dimension
img = img.unsqueeze(1)
# Context-manager
with torch.no_grad():
outputs = model(img)
output = outputs.argmax(dim = 1)
for i in range(batch_size):
output_dice = output[i]
dices = []
for k in range(1, 4):
dices.append(categorical_dice(output_dice.data.cpu().numpy(), mask[i].data.cpu().numpy(), label_class = k))
# average Dice over the foreground classes; nanmean skips the NaN produced when a class is absent from both masks
dice_score = np.nanmean(np.array(dices))
val_dice += dice_score
val_dice_cnt = val_dice_cnt + 1
# Loss and the total
loss = my_loss.forward2(outputs, mask.long(),mask,output)
val_total_loss += loss
# After 1 for-loop on the validating dataset
# Average Loss
val_losses.append(val_total_loss.item() / 20)
plot_val_dice.append(val_dice / val_dice_cnt)
# Save the model whenever the validation loss reaches a new minimum
if Max_val > val_total_loss:
Max_val = val_total_loss
print(epoch, " saved model")
PATH = './Unet_01.pth'
torch.save(model.state_dict(), PATH)
# Output the Loss
print("validation average loss: %f" % (val_total_loss / 20))
# After this epoch
# Output the timer
T2 = time.time()
print('Using time: %fs' % (T2 - T1))
print('Training time per epoch: %.3f s' % ((T2 - T1)/(epoch + 1)))
# Output the Loss
# print(len(train_losses))
# print(len(val_losses))
# Plot the Loss
plt.plot(range(num_epoches), train_losses,c = 'r',label = "train_losses")
plt.plot(range(num_epoches), val_losses,c = 'y',label = "val_losses")
plt.legend(loc = 'best')
plt.show()
# Plot the Dice Score
plt.plot(range(num_epoches), plot_train_dice,c = 'r',label = "train_dice_score")
plt.plot(range(num_epoches), plot_val_dice,c = 'y',label = "val_dice_score")
plt.legend(loc = 'best')
plt.show()
LR is set to 0.001
Epoch 1, average loss: 0.459672  0 saved model  validation average loss: 0.329990
Epoch 2, average loss: 0.253035  1 saved model  validation average loss: 0.223149
Epoch 3, average loss: 0.195287  2 saved model  validation average loss: 0.194693
Epoch 4, average loss: 0.179049  3 saved model  validation average loss: 0.193236
Epoch 5, average loss: 0.145511  4 saved model  validation average loss: 0.158362
Epoch 6, average loss: 0.121022  5 saved model  validation average loss: 0.131897
Epoch 7, average loss: 0.122300  6 saved model  validation average loss: 0.106578
Epoch 8, average loss: 0.110124  validation average loss: 0.124797
Epoch 9, average loss: 0.097674  validation average loss: 0.121700
Epoch 10, average loss: 0.102372  9 saved model  validation average loss: 0.103962
Epoch 11, average loss: 0.100813  validation average loss: 0.143839
Epoch 12, average loss: 0.095388  validation average loss: 0.104325
Epoch 13, average loss: 0.089745  validation average loss: 0.106784
Epoch 14, average loss: 0.081929  13 saved model  validation average loss: 0.089734
Epoch 15, average loss: 0.069288  14 saved model  validation average loss: 0.082610
LR is set to 0.0005
Epoch 16, average loss: 0.059912  validation average loss: 0.084576
Epoch 17, average loss: 0.060850  16 saved model  validation average loss: 0.074422
Epoch 18, average loss: 0.057740  17 saved model  validation average loss: 0.071654
Epoch 19, average loss: 0.052044  validation average loss: 0.073524
Epoch 20, average loss: 0.054089  19 saved model  validation average loss: 0.070292
Epoch 21, average loss: 0.052416  20 saved model  validation average loss: 0.069903
Epoch 22, average loss: 0.049471  21 saved model  validation average loss: 0.068223
Epoch 23, average loss: 0.055096  validation average loss: 0.080117
Epoch 24, average loss: 0.046766  validation average loss: 0.071065
Epoch 25, average loss: 0.051937  validation average loss: 0.074814
Epoch 26, average loss: 0.050402  validation average loss: 0.084648
Epoch 27, average loss: 0.056260  validation average loss: 0.075572
Epoch 28, average loss: 0.065385  validation average loss: 0.074614
Epoch 29, average loss: 0.054966  validation average loss: 0.073986
Epoch 30, average loss: 0.053932  validation average loss: 0.071617
LR is set to 0.00025
Epoch 31, average loss: 0.048237  30 saved model  validation average loss: 0.067426
Epoch 32, average loss: 0.043294  31 saved model  validation average loss: 0.063475
Epoch 33, average loss: 0.044307  32 saved model  validation average loss: 0.059879
Epoch 34, average loss: 0.042146  33 saved model  validation average loss: 0.059638
Epoch 35, average loss: 0.042247  validation average loss: 0.061419
Epoch 36, average loss: 0.041485  35 saved model  validation average loss: 0.054637
Epoch 37, average loss: 0.040213  36 saved model  validation average loss: 0.053995
Epoch 38, average loss: 0.038486  validation average loss: 0.057240
Epoch 39, average loss: 0.036941  validation average loss: 0.055789
Epoch 40, average loss: 0.037708  validation average loss: 0.055471
Epoch 41, average loss: 0.036073  40 saved model  validation average loss: 0.053745
Epoch 42, average loss: 0.036919  validation average loss: 0.055776
Epoch 43, average loss: 0.035487  validation average loss: 0.055825
Epoch 44, average loss: 0.036720  validation average loss: 0.055589
Epoch 45, average loss: 0.034620  validation average loss: 0.054576
LR is set to 0.000125
Epoch 46, average loss: 0.034982  validation average loss: 0.054879
Epoch 47, average loss: 0.033646  46 saved model  validation average loss: 0.053553
Epoch 48, average loss: 0.033618  validation average loss: 0.054699
Epoch 49, average loss: 0.034527  validation average loss: 0.054098
Epoch 50, average loss: 0.033825  49 saved model  validation average loss: 0.050508
Using time: 67.138854s
Training time per epoch: 1.343 s
# Train Unet3plus
np.seterr(invalid='ignore') # suppress invalid-value (0/0) warnings from the Dice computation
model = UNet_3Plus().to(device)
learning_rate = 0.001
weight_decay = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate, weight_decay = weight_decay)
# Initialize the training dataset
data_path = './data/train'
num_workers = 1
batch_size = 4
train_set = TrainDataset(data_path)
training_data_loader = DataLoader(dataset=train_set, num_workers=num_workers, batch_size=batch_size, shuffle=True)
# Initialize the validating dataset
val_data_path = './data/val'
val_num_workers = 1
val_batch_size = 4
val_set = TrainDataset(val_data_path)
val_data_loader = DataLoader(dataset=val_set, num_workers=val_num_workers, batch_size=val_batch_size, shuffle=False)
# Initialize the values
running_loss = 0.0
my_loss = losses()
Max_val = 100000000000 # best (lowest) validation loss seen so far
train_losses = []
val_losses = []
plot_train_dice = []
plot_val_dice = []
# Initialize the timer
T1 = time.time()
# Epoch
num_epoches = 50
# Fetch images and labels.
for epoch in range(num_epoches):
# Optimizer
optimizer = exp_lr_scheduler(optimizer, epoch)
# Initialize the values
total_loss = 0
val_total_loss = 0
train_dice = 0
train_dice_cnt = 0
val_dice = 0
val_dice_cnt = 0
# For-loop on the training set
for iteration, sample in enumerate(training_data_loader):
# Initialize the images and labels
img, mask = sample
img = img.to(device)
mask = mask.to(device)
# Adjust the dimension
img = img.unsqueeze(1)
# Gradient -> 0
optimizer.zero_grad()
# Output
outputs = model(img)
output = outputs.argmax(dim = 1)
for i in range(batch_size):
output_dice = output[i]
dices = []
for k in range(1, 4):
dices.append(categorical_dice(output_dice.data.cpu().numpy(), mask[i].data.cpu().numpy(), label_class = k))
# average Dice over the foreground classes; nanmean skips the NaN produced when a class is absent from both masks
dice_score = np.nanmean(np.array(dices))
train_dice += dice_score
train_dice_cnt = train_dice_cnt + 1
# Loss and the total
loss = my_loss.forward2(outputs,mask.long(),mask, output)
total_loss += loss.detach() # detach so each batch's computation graph can be freed
loss.backward()
# Update w
optimizer.step()
# After 1 for-loop on the training dataset
# Average Loss
train_losses.append(total_loss.item() / 100)
plot_train_dice.append(train_dice / train_dice_cnt)
print("Epoch %d, average loss: %f" % (epoch + 1, total_loss / 100))
# For-loop on the validating set
for iteration, sample in enumerate(val_data_loader):
# Initialize the images and labels
img, mask = sample
img = img.to(device)
mask = mask.to(device)
# Adjust the dimension
img = img.unsqueeze(1)
# Context-manager
with torch.no_grad():
outputs = model(img)
output = outputs.argmax(dim = 1)
for i in range(batch_size):
output_dice = output[i]
dices = []
for k in range(1, 4):
dices.append(categorical_dice(output_dice.data.cpu().numpy(), mask[i].data.cpu().numpy(), label_class = k))
# average Dice over the foreground classes; nanmean skips the NaN produced when a class is absent from both masks
dice_score = np.nanmean(np.array(dices))
val_dice += dice_score
val_dice_cnt = val_dice_cnt + 1
# Loss and the total
loss = my_loss.forward2(outputs, mask.long(),mask,output)
val_total_loss += loss
# After 1 for-loop on the validating dataset
# Average Loss
val_losses.append(val_total_loss.item() / 20)
plot_val_dice.append(val_dice / val_dice_cnt)
# Save the model whenever the validation loss reaches a new minimum
if Max_val > val_total_loss:
Max_val = val_total_loss
print(epoch, " saved model")
PATH = './Unet3+_01.pth'
torch.save(model.state_dict(), PATH)
# Output the Loss
print("validation average loss: %f" % (val_total_loss / 20))
# After this epoch
# Output the timer
T2 = time.time()
print('Using time: %fs' % (T2 - T1))
print('Training time per epoch: %.3f s' % ((T2 - T1)/(epoch + 1)))
# Output the Loss
# print(len(train_losses))
# print(len(val_losses))
# Plot the Loss
plt.plot(range(num_epoches), train_losses,c='r',label = "train_losses")
plt.plot(range(num_epoches), val_losses,c='y',label = "val_losses")
plt.legend(loc='best')
plt.show()
# Plot the Dice Score
plt.plot(range(num_epoches), plot_train_dice,c='r',label = "train_dice_score")
plt.plot(range(num_epoches), plot_val_dice,c='y',label = "val_dice_score")
plt.legend(loc='best')
plt.show()
LR is set to 0.001
/usr/local/lib64/python3.7/site-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode)
Epoch 1, average loss: 0.318908  0 saved model  validation average loss: 0.198778
Epoch 2, average loss: 0.174102  1 saved model  validation average loss: 0.169673
Epoch 3, average loss: 0.137647  2 saved model  validation average loss: 0.145644
Epoch 4, average loss: 0.119870  3 saved model  validation average loss: 0.133171
Epoch 5, average loss: 0.112009  validation average loss: 0.140130
Epoch 6, average loss: 0.102306  5 saved model  validation average loss: 0.114133
Epoch 7, average loss: 0.092045  validation average loss: 0.125563
Epoch 8, average loss: 0.077430  7 saved model  validation average loss: 0.104993
Epoch 9, average loss: 0.076501  8 saved model  validation average loss: 0.091840
Epoch 10, average loss: 0.069317  9 saved model  validation average loss: 0.082105
Epoch 11, average loss: 0.063094  validation average loss: 0.092037
Epoch 12, average loss: 0.059135  11 saved model  validation average loss: 0.078227
Epoch 13, average loss: 0.051121  12 saved model  validation average loss: 0.072562
Epoch 14, average loss: 0.047594  13 saved model  validation average loss: 0.069744
Epoch 15, average loss: 0.049554  validation average loss: 0.072882
LR is set to 0.0005
Epoch 16, average loss: 0.046653  15 saved model  validation average loss: 0.065500
Epoch 17, average loss: 0.042015  16 saved model  validation average loss: 0.063587
Epoch 18, average loss: 0.042106  validation average loss: 0.070367
Epoch 19, average loss: 0.039020  validation average loss: 0.065776
Epoch 20, average loss: 0.037959  19 saved model  validation average loss: 0.062026
Epoch 21, average loss: 0.037126  20 saved model  validation average loss: 0.061478
Epoch 22, average loss: 0.037155  21 saved model  validation average loss: 0.060984
Epoch 23, average loss: 0.039788  validation average loss: 0.063795
Epoch 24, average loss: 0.034923  validation average loss: 0.064392
Epoch 25, average loss: 0.035728  validation average loss: 0.062114
Epoch 26, average loss: 0.036878  validation average loss: 0.066561
Epoch 27, average loss: 0.036949  26 saved model  validation average loss: 0.060824
Epoch 28, average loss: 0.038787  validation average loss: 0.063201
Epoch 29, average loss: 0.043902  validation average loss: 0.065341
Epoch 30, average loss: 0.037339  validation average loss: 0.061528
LR is set to 0.00025
Epoch 31, average loss: 0.032480  30 saved model  validation average loss: 0.057644
Epoch 32, average loss: 0.031600  31 saved model  validation average loss: 0.055245
Epoch 33, average loss: 0.030505  validation average loss: 0.058239
Epoch 34, average loss: 0.030820  33 saved model  validation average loss: 0.054914
Epoch 35, average loss: 0.027805  validation average loss: 0.055998
Epoch 36, average loss: 0.026989  35 saved model  validation average loss: 0.051842
Epoch 37, average loss: 0.027374  validation average loss: 0.052827
Epoch 38, average loss: 0.027172  validation average loss: 0.056404
Epoch 39, average loss: 0.026774  validation average loss: 0.055845
Epoch 40, average loss: 0.025606  validation average loss: 0.055165
Epoch 41, average loss: 0.026358  validation average loss: 0.058811
Epoch 42, average loss: 0.026261  validation average loss: 0.061893
Epoch 43, average loss: 0.026486  validation average loss: 0.057104
Epoch 44, average loss: 0.025823  validation average loss: 0.058801
Epoch 45, average loss: 0.025533  validation average loss: 0.054437
LR is set to 0.000125
Epoch 46, average loss: 0.024072  validation average loss: 0.052466
Epoch 47, average loss: 0.022315  validation average loss: 0.055846
Epoch 48, average loss: 0.021470  validation average loss: 0.057224
Epoch 49, average loss: 0.020983  validation average loss: 0.056521
Epoch 50, average loss: 0.021675  validation average loss: 0.055436
Using time: 290.551899s
Training time per epoch: 5.811 s
# UNet_3Plus()
model_UNet3plus = UNet_3Plus().to(device)
PATH = './Unet3+_01.pth'
model_UNet3plus_Max = torch.load(PATH, map_location='cpu')
model_UNet3plus.load_state_dict(model_UNet3plus_Max)
model_UNet3plus.to(device)
# UNet()
model_UNet = Unet().to(device)
PATH2 = './Unet_01.pth'
model_UNet_Max = torch.load(PATH2, map_location='cpu')
model_UNet.load_state_dict(model_UNet_Max)
model_UNet.to(device)
Unet(
(inc): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
(down1): Down(
(max_pool_conv): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
)
(down2): Down(
(max_pool_conv): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
)
(down3): Down(
(max_pool_conv): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
)
(drop3): Dropout2d(p=0.05, inplace=False)
(down4): Down(
(max_pool_conv): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(1): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
)
(drop4): Dropout2d(p=0.05, inplace=False)
(up1): Up(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
(2): ReLU(inplace=True)
)
(conv): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
(up2): Up(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(2): ReLU(inplace=True)
)
(conv): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
(up3): Up(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
(2): ReLU(inplace=True)
)
(conv): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
(up4): Up(
(up): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
(2): ReLU(inplace=True)
)
(conv): DoubleConv(
(double_conv): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
)
)
)
(out): out_conv(
(conv): Conv2d(64, 4, kernel_size=(1, 1), stride=(1, 1))
)
)
# Define the function of confusion matrix
def plot_confusion_matrix(cm, classes,
normalize = False,
title = 'Confusion matrix',
cmap = plt.cm.Blues):
"""
This function prints and plots the confusion matrix.
Normalization can be applied by setting `normalize = True`.
"""
# normalize (when requested) before drawing, so the plot matches the printed values
if normalize:
cm = cm.astype('float') / cm.sum(axis = 1)[:, np.newaxis]
print("Normalized confusion matrix")
else:
print('Confusion matrix, without normalization')
print(cm)
plt.imshow(cm, interpolation = 'nearest', cmap = cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation = 45)
plt.yticks(tick_marks, classes)
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, cm[i, j],
horizontalalignment = "center",
color = "white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
# Initialization
y_mmpre = []
y_mmlabel = []
# Initialize the validating dataset
val_data_path = './data/val'
val_num_workers = 1
val_batch_size = 1
val_set = TrainDataset(val_data_path)
val_data_loader = DataLoader(dataset = val_set, num_workers = val_num_workers, batch_size = val_batch_size, shuffle = True)
# Initialize the Loss
val_total_loss = 0
w = 0.74 # ensemble weight: 0.74 on the UNet3+ logits, 0.26 on the UNet logits
# For-loop on the validating dataset
for iteration, sample in enumerate(val_data_loader):
# Run the model
img, mask = sample
img = img.to(device)
mask = mask.to(device)
img = img.unsqueeze(1)
with torch.no_grad():
output = model_UNet3plus(img)
output2 = model_UNet(img)
# Ensemble: weighted average of the two models' logits
output = output * w + output2 * (1 - w)
output2 = output.argmax(dim = 1)
# Flatten the predictions and labels into 1-D lists of pixel classes
y_mmpre.extend(int(v) for v in output2[0].cpu().numpy().flatten())
y_mmlabel.extend(int(v) for v in mask[0].cpu().numpy().flatten())
# Calculate loss
loss = my_loss.forward2(output, mask.long(),mask,output)
val_total_loss += loss
# Initialize confusion matrix
cm = np.zeros((4, 4))
cnt = [0, 0, 0, 0]
# Count pixels of each true class, then fill the confusion matrix
for i in range(len(y_mmlabel)):
cnt[y_mmlabel[i]] = cnt[y_mmlabel[i]] + 1
for i in range(len(y_mmpre)):
cm[y_mmpre[i]][y_mmlabel[i]] = cm[y_mmpre[i]][y_mmlabel[i]] + 1
# Normalize each column by its true-class pixel count (so the diagonal is per-class recall)
for i in range(4):
for j in range(4):
cm[i][j] = round(cm[i][j] / cnt[j], 2) # keep two decimal places
# Define Class name
class_names = ['background', 'RV', 'Myo', 'LV']
# Plot the confusion matrix (already column-normalized above, so normalize=False here)
plt.figure()
plot_confusion_matrix(cm, classes=class_names, normalize=False,
title='Confusion matrix')
plt.show()
# Output the Loss
print("validation loss: %f" % (val_total_loss / 20))
Confusion matrix, without normalization
[[0.99 0.12 0.04 0.  ]
 [0.   0.86 0.02 0.  ]
 [0.   0.03 0.91 0.06]
 [0.   0.   0.03 0.94]]
validation loss: 1.561914
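The matrix above was normalized column by column, so each column gives the fraction of that true class's pixels assigned to each predicted class, and the diagonal is the per-class recall. A minimal numpy sketch of that normalization with made-up counts:

```python
import numpy as np

# hypothetical 2-class pixel counts: rows = predicted class, columns = true class
counts = np.array([[8.0, 1.0],
                   [2.0, 9.0]])

# divide every column by its total true-class pixel count
col_norm = counts / counts.sum(axis=0, keepdims=True)
# each column now sums to 1; the diagonal entries are the per-class recalls
```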
# Calculate the Dice Score
dice = 0
for _, sample in enumerate(val_data_loader): # outer index unused; avoid shadowing the inner loop variable i
imgList, maskList = sample
for i in range(imgList.shape[0]):
img = imgList[i].to(device)
mask = maskList[i].to(device)
# Run the model
output = model_UNet3plus(img.unsqueeze(0).unsqueeze(0))
output2 = model_UNet(img.unsqueeze(0).unsqueeze(0))
# Ensemble: weighted average of the two models' logits
output = output * w + output2 * (1 - w)
output = output.argmax(dim = 1)
plt.figure(figsize=(5,5))
plt.subplot(1, 3, 1)
plt.imshow(img.to('cpu'), cmap = 'gray')
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(mask.to('cpu'), cmap = 'gray')
plt.axis('off')
plt.subplot(1, 3, 3)
plt.imshow(output.squeeze().to('cpu'), cmap = 'gray')
plt.axis('off')
#plt.show()
mask.to('cpu')
output.to('cpu')
dice_score = categorical_dice(output.data.cpu().numpy(), mask.data.cpu().numpy(),1)
dice_score += categorical_dice(output.data.cpu().numpy(), mask.data.cpu().numpy(),2)
dice_score += categorical_dice(output.data.cpu().numpy(), mask.data.cpu().numpy(),3)
dice_score /= 3
dice += dice_score
dice /= 20
print("ave val dice:",dice)
ave val dice: 0.8697095580483929
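`categorical_dice` is defined earlier in the notebook; assuming it follows the standard per-class definition Dice_k = 2|P_k ∩ G_k| / (|P_k| + |G_k|), a minimal self-contained sketch of what it computes (the name `dice_sketch` is ours, not the notebook's) would be:

```python
import numpy as np

# Minimal sketch of a per-class Dice score, assuming the notebook's
# categorical_dice follows the standard definition
# Dice_k = 2|P_k ∩ G_k| / (|P_k| + |G_k|) for class label k.
def dice_sketch(pred, truth, k, eps=1e-8):
    p = (pred == k)
    g = (truth == k)
    return 2.0 * np.logical_and(p, g).sum() / (p.sum() + g.sum() + eps)

# Identical masks score ~1; a mask with no overlap for class k scores 0.
a = np.array([[0, 1], [2, 3]])
print(dice_sketch(a, a, 1))
```

Averaging this score over the three foreground classes (RV, Myo, LV) and over the 20 validation cases yields the 0.87 reported above, which already exceeds the 0.85 target.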
# Initialise the testing dataset
test_data_path = './data/test'
test_set = TestDataset(test_data_path)
test_data_loader = DataLoader(dataset=test_set, num_workers=0, batch_size=4, shuffle=False)
# Loop over the testing dataset
for j, sample in enumerate(test_data_loader):
    imgList = sample
    for i in range(imgList.shape[0]):
        # Run the models
        img = imgList[i].to(device)
        output = model_UNet3plus(img.unsqueeze(0).unsqueeze(0))
        output2 = model_UNet(img.unsqueeze(0).unsqueeze(0))
        # Ensemble the two models
        output = output * w + output2 * (1 - w)
        output = output.argmax(dim=1)
        # Plot the image and the predicted mask
        plt.figure(figsize=(5, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(img.to('cpu'), cmap='gray')
        plt.axis('off')
        plt.subplot(1, 2, 2)
        plt.imshow(output.squeeze().to('cpu'), cmap='gray')
        plt.axis('off')
        # Save the predicted mask; test images are numbered from cmr121
        pictureName = "cmr" + str(121 + j * 4 + i) + "_mask.png"
        print(pictureName)
        cv2.imwrite(os.path.join('./data/test/mask', pictureName), output.cpu().numpy().squeeze(0).astype('uint8'))
        plt.show()
cmr121_mask.png
cmr122_mask.png
...
cmr200_mask.png
(80 mask files printed, cmr121_mask.png through cmr200_mask.png)
def rle_encoding(x):
    '''
    *** Credit to https://www.kaggle.com/rakhlin/fast-run-length-encoding-python ***
    x: numpy array of shape (height, width), 1 - mask, 0 - background
    Returns run length as list
    '''
    # Indices of mask pixels in column-major order, as required by the submission format
    dots = np.where(x.T.flatten() == 1)[0]
    run_lengths = []
    prev = -2
    for b in dots:
        if b > prev + 1:
            # Start a new run: record its 1-based start position and a length of 0
            run_lengths.extend((b + 1, 0))
        run_lengths[-1] += 1
        prev = b
    return run_lengths
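As a quick sanity check on the encoder: the mask is flattened column by column (`x.T.flatten()`) and runs of 1s are reported as 1-based (start, length) pairs. The function is restated below so the example runs standalone:

```python
import numpy as np

# Worked example of the run-length encoding used for submission
# (restated from the notebook so this snippet is self-contained).
def rle_encoding(x):
    dots = np.where(x.T.flatten() == 1)[0]
    run_lengths, prev = [], -2
    for b in dots:
        if b > prev + 1:
            run_lengths.extend((b + 1, 0))
        run_lengths[-1] += 1
        prev = b
    return run_lengths

x = np.array([[1, 0],
              [1, 1]])
# Column-major flattening gives [1, 1, 0, 1]: a run of length 2 starting
# at position 1, then a run of length 1 starting at position 4.
print(rle_encoding(x))  # → [1, 2, 4, 1]
```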
def submission_converter(mask_directory, path_to_save):
    # Write one run-length-encoded row per foreground class per test case
    writer = open(os.path.join(path_to_save, "submission.csv"), 'w')
    writer.write('id,encoding\n')
    files = os.listdir(mask_directory)
    for file in files:
        name = file[:-4]  # strip the '.png' extension
        mask = cv2.imread(os.path.join(mask_directory, file), cv2.IMREAD_UNCHANGED)
        # Build and encode a binary mask for each of the three foreground classes
        mask1 = (mask == 1)
        mask2 = (mask == 2)
        mask3 = (mask == 3)
        encoded_mask1 = ' '.join(str(e) for e in rle_encoding(mask1))
        encoded_mask2 = ' '.join(str(e) for e in rle_encoding(mask2))
        encoded_mask3 = ' '.join(str(e) for e in rle_encoding(mask3))
        writer.write(name + '1,' + encoded_mask1 + "\n")
        writer.write(name + '2,' + encoded_mask2 + "\n")
        writer.write(name + '3,' + encoded_mask3 + "\n")
    writer.close()

mask_directory = './data/test/mask'
save = './data/test'
submission_converter(mask_directory, save)